clean image
- North America > United States > California > Los Angeles County > Long Beach (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Europe > Austria > Vienna (0.14)
- (6 more...)
- Government (0.95)
- Information Technology > Security & Privacy (0.69)
- Transportation > Ground (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Analyzing the Sample Complexity of Self-Supervised Image Reconstruction Methods
Supervised training of deep neural networks on pairs of clean image and noisy measurement achieves state-of-the-art performance for many image reconstruction tasks, but such training pairs are difficult to collect. Self-supervised methods enable training based on noisy measurements only, without clean images. In this work, we investigate the cost of self-supervised training in terms of sample complexity for a class of self-supervised methods that enable the computation of unbiased estimates of gradients of the supervised loss, including noise2noise methods. We analytically show that a model trained with such self-supervised training is as good as the same model trained in a supervised fashion, but self-supervised training requires more examples than supervised training. We then study self-supervised denoising and accelerated MRI empirically and characterize the cost of self-supervised training in terms of the number of additional samples required, and find that the performance gap between self-supervised and supervised training vanishes as a function of the training examples, at a problem-dependent rate, as predicted by our theory.
Improving robustness to corruptions with multiplicative weight perturbations
Deep neural networks (DNNs) excel on clean images but struggle with corrupted ones. Incorporating specific corruptions into the data augmentation pipeline can improve robustness to those corruptions but may harm performance on clean images and other types of distortion. In this paper, we introduce an alternative approach that improves the robustness of DNNs to a wide range of corruptions without compromising accuracy on clean images. We first demonstrate that input perturbations can be mimicked by multiplicative perturbations in the weight space. Leveraging this, we propose Data Augmentation via Multiplicative Perturbation (DAMP), a training method that optimizes DNNs under random multiplicative weight perturbations. We also examine the recently proposed Adaptive Sharpness-Aware Minimization (ASAM) and show that it optimizes DNNs under adversarial multiplicative weight perturbations. Experiments on image classification datasets (CIFAR-10/100, TinyImageNet and ImageNet) and neural network architectures (ResNet50, ViT-S/16, ViT-B/16) show that DAMP enhances model generalization performance in the presence of corruptions across different settings. Notably, DAMP is able to train a ViT-S/16 on ImageNet from scratch, reaching the top-1 error of 23.7% which is comparable to ResNet50 without extensive data augmentations.
Interpreting Structured Perturbations in Image Protection Methods for Diffusion Models
Martin, Michael R., Chan, Garrick, Ma, Kwan-Liu
Recent image protection mechanisms such as Glaze and Nightshade introduce imperceptible, adversarially designed perturbations intended to disrupt downstream text-to-image generative models. While their empirical effectiveness has been demonstrated, the internal structure, detectability, and representational behavior of these perturbations remain poorly understood. In this study, we demonstrated a systematic explainable AI analysis of image protection perturbations using a unified framework that integrates white-box feature-space inspection and black-box signal-level probing. Through latent-space clustering, feature-channel activation analysis, occlusion-based spatial sensitivity mapping, and frequency-domain spectral characterization, we revealed that modern protection mechanisms operate as structured, low-entropy perturbations that remain tightly coupled to underlying image content across representational, spatial, and spectral domains in all evaluated cases. We showed that protected images preserve content-driven feature organization with protection-specific substructure rather than inducing global representational drift. Detectability is governed by interacting effects of perturbation entropy, spatial deployment, and frequency alignment as revealed through combined synthetic and spectral analyses, with sequential protection amplifying detectable structure rather than suppressing it. Frequency-domain analysis further demonstrated that Glaze and Nightshade redistribute energy along dominant image-aligned frequency axes rather than introducing spectrally diffuse noise. These results suggested that contemporary image protection operates through structured feature-level deformation rather than semantic dislocation, providing mechanistic insight into why protection signals remain visually subtle yet consistently detectable. This work advances the interpretability of adversarial image protection and informs the design of future defenses and detection strategies for generative AI systems.
- North America > United States > California > Yolo County > Davis (0.04)
- North America > United States > New York > New York County > New York City (0.04)
Lost in Translation and Noise: A Deep Dive into the Failure Modes of VLMs on Real-World Tables
Singh, Anshul, Chaudhary, Rohan, Singh, Gagneet, Kumary, Abhay
The impressive performance of VLMs is largely measured on benchmarks that fail to capture the complexities of real-world scenarios. Existing datasets for tabular QA, such as WikiTableQuestions and FinQA, are overwhelmingly monolingual (English) and present tables in a digitally perfect, clean format. This creates a significant gap between research and practice. To address this, we present \textbf{MirageTVQA}, a new benchmark designed to evaluate VLMs on these exact dimensions. Featuring nearly 60,000 QA pairs across 24 languages, MirageTVQA challenges models with tables that are not only multilingual but also visually imperfect, incorporating realistic noise to mimic scanned documents. Our evaluation of the leading VLMs reveals two primary failure points: a severe degradation in performance (over 35\% drop for the best models) when faced with visual noise and a consistent English-first bias where reasoning abilities fail to transfer to other languages. MirageTVQA provides a benchmark for measuring and driving progress towards more robust VLM models for table reasoning. The dataset and the code are available at: https://github.com/anshulsc/MirageTVQA.
- Asia > India > Chandigarh (0.05)
- North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
- Asia > India > Karnataka > Bengaluru (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.53)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (5 more...)
- Information Technology (0.68)
- Education (0.46)
- North America (0.14)
- Asia > Middle East > Israel (0.04)
- Research Report > Experimental Study (0.93)
- Research Report > New Finding (0.67)
- North America > United States > Pennsylvania (0.04)
- Asia > China > Zhejiang Province (0.04)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)